Abstract. The ubiquity of modular structure in real-world complex networks is the
focus of many attempts to understand the interplay between network topology and
functionality. The best approaches to the identification of modular structure are
based on the optimization of a quality function known as modularity. However, this
optimization is a hard task, given that the problem belongs to the NP-hard class.
Here we propose an exact method for reducing the size of weighted (directed and
undirected) complex networks while keeping their modularity invariant. This size
reduction allows the heuristic algorithms that optimize modularity to explore the
modularity landscape more thoroughly. We compare the modularity obtained in several
real complex networks using the Extremal Optimization algorithm, before and after
the size reduction, showing the improvement obtained. We speculate that the proposed
analytical size reduction could be extended to an exact coarse graining of the
network in the scope of real-space renormalization.



PACS number: 89.75 



Submitted to: New J. Phys. 



I Author to whom any correspondence should be addressed 



Size reduction of complex networks preserving modularity 






1. Introduction 

The study of community structure in complex networks is becoming a classical
subject in the area because several aspects of the problem are both challenging and
interesting. The challenge comes from the difficulty of unveiling the best partition of
the network in terms of communities, in the sense of groups of nodes that are more
densely connected among themselves than with the rest of the network [1]. The interest
comes from the fact that this level of description could help to elucidate an
organization of the network prescribed by functionalities [2], [3], and also because it
resembles the coarse graining process used in statistical physics to describe systems
at the mesoscale.

The most successful solutions to the community detection problem, in terms of
accuracy and computational cost, are those based on the optimization of a
quality function called modularity, proposed by Newman [4], which allows the comparison
of different partitionings of the network. Given a network partitioned into communities,
with $C_i$ the community to which node i is assigned, the mathematical definition of
modularity is expressed in terms of the weighted adjacency matrix $w_{ij}$, which represents
the value of the weight of the link between i and j (0 if no link exists), and the strengths
$w_i = \sum_j w_{ij}$, as [5]

$$Q = \frac{1}{2w} \sum_i \sum_j \left( w_{ij} - \frac{w_i w_j}{2w} \right) \delta(C_i, C_j) , \qquad (1.1)$$

where the Kronecker delta function $\delta(C_i, C_j)$ takes the value 1 if nodes i and j are in
the same community and 0 otherwise, and the total strength is
$2w = \sum_i w_i = \sum_i \sum_j w_{ij}$.

The modularity of a given partition is then the probability of having edges falling
within groups in the network minus the expected probability in an equivalent (null case)
network with the same number of nodes, and edges placed at random preserving the
nodes' strength. The larger the value of modularity, the better the partitioning, because
it deviates more from the null case. Several authors have attacked the problem by proposing
different optimization heuristics [6], [7], [8], [9], [10], [11], since the number of
different partitions is equal to the Bell or exponential numbers, which grow at least
exponentially in the number of nodes N. Indeed, optimization of modularity is an NP-hard
(Non-deterministic
Polynomial-time hard) problem [T3] . 

The definition of modularity can also be extended, preserving its semantics in terms
of probability, to the scenario of weighted directed networks as follows:

$$Q = \frac{1}{2w} \sum_i \sum_j \left( w_{ij} - \frac{w_i^{\mathrm{out}} w_j^{\mathrm{in}}}{2w} \right) \delta(C_i, C_j) , \qquad (1.2)$$

where $w_i^{\mathrm{out}}$ and $w_j^{\mathrm{in}}$ are respectively the output and input strengths of nodes i and j,

$$w_i^{\mathrm{out}} = \sum_j w_{ij} , \qquad (1.3)$$

$$w_j^{\mathrm{in}} = \sum_i w_{ij} , \qquad (1.4)$$

and the total strength is

$$2w = \sum_i w_i^{\mathrm{out}} = \sum_j w_j^{\mathrm{in}} = \sum_i \sum_j w_{ij} . \qquad (1.5)$$

The input and output strengths are equal ($w_i = w_i^{\mathrm{out}} = w_i^{\mathrm{in}}$) if the network is
undirected, thus recovering the standard definition of strength. Furthermore, if the
network is unweighted and undirected, $w_i$ represents the degree of the i-th node, i.e. the
number of edges attached to it, and w is the total number of links of the network.
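
As an illustration of definitions (1.1) to (1.5), the following minimal sketch evaluates the modularity of a given partition from a dense weights matrix. The function name `modularity`, the NumPy representation and the small example network are assumptions made for this illustration and are not part of the original work.

```python
import numpy as np

def modularity(W, communities):
    """Modularity (1.2) of a partition; reduces to (1.1) when W is symmetric.

    W[i, j] is the weight of the link from i to j; communities[i] is C_i.
    """
    W = np.asarray(W, dtype=float)
    c = np.asarray(communities)
    w_out = W.sum(axis=1)                      # output strengths, eq. (1.3)
    w_in = W.sum(axis=0)                       # input strengths, eq. (1.4)
    two_w = W.sum()                            # total strength 2w, eq. (1.5)
    same = c[:, None] == c[None, :]            # Kronecker delta(C_i, C_j)
    B = W - np.outer(w_out, w_in) / two_w      # observed minus null-case weight
    return float((B * same).sum() / two_w)

# Hypothetical example: two triangles joined by a single edge (undirected, unweighted).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

print(modularity(W, [0, 0, 0, 1, 1, 1]))   # the two triangles as communities
print(modularity(W, [0, 0, 0, 0, 0, 0]))   # everything in a single community
```

For this example the two-triangle partition gives Q = 5/14, while the single-community partition gives Q = 0, as expected from the null-case interpretation.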

The challenge of optimizing modularity has attracted considerable effort from
the scientific community in recent years. Since the problem is NP-hard,
only optimization heuristics have been shown to be competent in finding sub-optimal
solutions of Q in feasible computational time. Nevertheless, when facing the
decomposition into communities of very large networks, optimality is usually sacrificed in
favor of computational time.

Our goal here is to demonstrate that it is possible to reduce the size of complex
networks while preserving the value of modularity, independently of the partition under
consideration. The systematic use of this reduction allows for a more exhaustive search
of the partitions' space, which usually ends in improved values of modularity compared
to those obtained without this size reduction. The paper is organized as follows.
In the next section we present the basics of the size reduction process. After that,
we provide analytic proofs for specific reductions. Finally, we exploit the reduction
process based on the mentioned properties, and compare the modularity results with
those obtained without size reduction in several real networks, using the Extremal
Optimization heuristics [8].

2. Size reduction preserving modularity 
2.1. Reduced graph 

Let G be a weighted complex network of size N, with weights $w_{ij} \geq 0$, $i, j \in \{1, \ldots, N\}$.
If the network is unweighted, the weights matrix becomes the usual connectivity matrix,
with values 1 for connected pairs of nodes and zero otherwise. We will assume that the
network may be directed, i.e. represented by a non-symmetric weights matrix.

Any grouping of the N nodes of the complex network G in N' parts may be
represented by a surjective function $R : \{1, \ldots, N\} \to \{1, \ldots, N'\}$ which assigns a
group index $R_i = R(i)$ to every i-th node in G. The reduced network G' in which each
of these groups is replaced by a single node may be easily defined in the following way:
the weight $w'_{rs}$ between the nodes which represent groups r and s is the sum of all the
weights connecting vertices in these groups,

$$w'_{rs} = \sum_i \sum_j w_{ij} \, \delta(R_i, r) \, \delta(R_j, s) , \qquad r, s \in \{1, \ldots, N'\} , \qquad (2.1)$$

where the sums run over all the N nodes of G. For unweighted networks the value of
$w'_{rs}$ is just the number of arcs from the first to the second group of nodes. It must be
emphasized that a node r of the reduced network G' acquires a self-loop if $w'_{rr} \neq 0$,
which summarizes the internal connectivity of the nodes of G forming this group.

The input and output strengths of the reduced network G' are

$${w'}_r^{\mathrm{out}} = \sum_s w'_{rs} = \sum_i \sum_j w_{ij} \, \delta(R_i, r) \sum_s \delta(R_j, s) = \sum_i w_i^{\mathrm{out}} \, \delta(R_i, r) , \qquad (2.2)$$

$${w'}_s^{\mathrm{in}} = \sum_r w'_{rs} = \sum_j \sum_i w_{ij} \, \delta(R_j, s) \sum_r \delta(R_i, r) = \sum_j w_j^{\mathrm{in}} \, \delta(R_j, s) , \qquad (2.3)$$

and its total strength 2w' is equal to the total strength 2w of the original network,

$$2w' = \sum_r {w'}_r^{\mathrm{out}} = \sum_s {w'}_s^{\mathrm{in}} = \sum_i w_i^{\mathrm{out}} = \sum_j w_j^{\mathrm{in}} = 2w . \qquad (2.4)$$
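
The construction of the reduced network in (2.1) can be sketched with an indicator matrix S, where S[i, r] = δ(R_i, r), so that the reduced weights are S^T W S. The function name `reduce_network`, the 0-based group indices and the small directed example are assumptions of this sketch.

```python
import numpy as np

def reduce_network(W, R):
    """Reduced weights w'_rs of eq. (2.1) for a grouping R (R[i] in 0..N'-1)."""
    W = np.asarray(W, dtype=float)
    R = np.asarray(R)
    n_groups = R.max() + 1
    S = np.zeros((len(R), n_groups))
    S[np.arange(len(R)), R] = 1.0      # S[i, r] = delta(R_i, r)
    return S.T @ W @ S                 # sum_ij w_ij delta(R_i, r) delta(R_j, s)

# Hypothetical directed example: 0->1 (2), 1->0 (4), 1->2 (3), 2->0 (1).
W = np.array([[0.0, 2.0, 0.0],
              [4.0, 0.0, 3.0],
              [1.0, 0.0, 0.0]])
R = [0, 0, 1]                          # group nodes 0 and 1 into a single node
W_red = reduce_network(W, R)
print(W_red)                           # the group {0, 1} acquires a self-loop
print(W_red.sum(), W.sum())            # total strength is preserved, eq. (2.4)
```

In this example the links 0->1 and 1->0 become a self-loop of weight 6 on the grouped node, and the total strength remains 10.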

2.2. Modularity preservation 

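The main property of the reduced network is the preservation of modularity (1.1) or
(1.2), i.e. the modularity of any partition of the reduced graph is equal to the modularity
of its corresponding partition of the original network.

More precisely, let $C' : \{1, \ldots, N'\} \to \{1, \ldots, M\}$ be a partition in M clusters
of the reduced network G'. Its corresponding partition $C : \{1, \ldots, N\} \to \{1, \ldots, M\}$
of the original graph is given by the composition of the reducing function R with the
partition C', i.e. $C = C' \circ R$. Therefore, the statement of the previous paragraph
becomes

$$Q'(C') = Q(C) . \qquad (2.5)$$

The proof is straightforward:

$$\begin{aligned}
Q'(C') &= \frac{1}{2w'} \sum_r \sum_s \left( w'_{rs} - \frac{{w'}_r^{\mathrm{out}} {w'}_s^{\mathrm{in}}}{2w'} \right) \delta(C'_r, C'_s) \\
&= \frac{1}{2w} \sum_r \sum_s \left( \sum_i \sum_j w_{ij} \, \delta(R_i, r) \, \delta(R_j, s) - \frac{1}{2w} \sum_i w_i^{\mathrm{out}} \delta(R_i, r) \sum_j w_j^{\mathrm{in}} \delta(R_j, s) \right) \delta(C'_r, C'_s) \\
&= \frac{1}{2w} \sum_i \sum_j \left( w_{ij} - \frac{w_i^{\mathrm{out}} w_j^{\mathrm{in}}}{2w} \right) \sum_r \sum_s \delta(R_i, r) \, \delta(R_j, s) \, \delta(C'_r, C'_s) \\
&= \frac{1}{2w} \sum_i \sum_j \left( w_{ij} - \frac{w_i^{\mathrm{out}} w_j^{\mathrm{in}}}{2w} \right) \delta(C_i, C_j) \\
&= Q(C) . \qquad (2.6)
\end{aligned}$$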
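
The identity (2.5) can also be checked numerically. The sketch below draws a random weighted directed network, a random surjective grouping R and random partitions C' of the reduced network, and compares Q'(C') with Q(C' o R). The random test network, the seed and the helper names are assumptions of this illustration.

```python
import numpy as np

def modularity(W, c):
    c = np.asarray(c)
    two_w = W.sum()
    B = W - np.outer(W.sum(axis=1), W.sum(axis=0)) / two_w
    return float((B * (c[:, None] == c[None, :])).sum() / two_w)

def reduce_network(W, R):
    S = np.eye(max(R) + 1)[np.asarray(R)]   # S[i, r] = delta(R_i, r)
    return S.T @ W @ S                      # eq. (2.1)

rng = np.random.default_rng(0)
N, N_red, M = 12, 5, 3
W = rng.random((N, N)) * (rng.random((N, N)) < 0.4)   # random weighted directed network
# The first N_red entries guarantee that every group index is used (R is surjective).
R = np.concatenate([np.arange(N_red), rng.integers(0, N_red, N - N_red)])
W_red = reduce_network(W, R)

max_gap = 0.0
for _ in range(100):
    C_red = rng.integers(0, M, N_red)    # partition C' of the reduced network
    C = C_red[R]                         # corresponding partition C = C' o R
    max_gap = max(max_gap, abs(modularity(W_red, C_red) - modularity(W, C)))
print(max_gap)   # expected to be at rounding-error level
```

Note that the check holds for any grouping R, not only for groupings that are optimal; the analytic reductions of the next section are what select groupings compatible with the optimal partition.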

We have found a relevant property of modularity, namely that those nodes forming a
community in the optimal partition can be represented by a unique node in the reduced
network. Each node in the reduced network summarizes the information necessary for
the calculation of modularity in its self-loop (which accounts for the intraconnectivity of
the community) and its arcs (which account for the total strengths with the rest of the
network). The question now is: how can we determine which nodes will belong to the same
community in the optimal partition, before this partition is obtained? The answer will
provide a size reduction method for complex networks that preserves modularity.



3. Analytic reductions 

Here we give the proof for certain possible analytic size reductions of weighted networks, 
undirected and directed. 



3.1. Reductions for undirected networks 

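The modularity of an undirected network may be written as

$$Q = \sum_i q_i , \qquad (3.1)$$

where

$$q_i = \frac{1}{2w} \sum_j \left( w_{ij} - \frac{w_i w_j}{2w} \right) \delta(C_i, C_j) \qquad (3.2)$$

is the contribution to modularity of the i-th node. If we allow this node to change
community, the value of $C_i$ becomes a parameter, and therefore it is useful to define

$$q_{i,r} = \frac{1}{2w} \sum_j \left( w_{ij} - \frac{w_i w_j}{2w} \right) \delta(C_j, r) , \qquad q_i = q_{i,C_i} , \qquad (3.3)$$

which accounts for the contribution of the i-th node to modularity if it were in
community r. The separation of the self-loop term, which does not depend on which
community node i belongs to, leads to the definition of

$$\tilde{q}_{i,r} = \frac{1}{2w} \sum_{j \neq i} \left( w_{ij} - \frac{w_i w_j}{2w} \right) \delta(C_j, r) , \qquad \tilde{q}_i = \tilde{q}_{i,C_i} , \qquad (3.4)$$

and

$$\tilde{Q} = \sum_i \tilde{q}_i = \frac{1}{2w} \sum_i \sum_{j \neq i} \left( w_{ij} - \frac{w_i w_j}{2w} \right) \delta(C_i, C_j) , \qquad (3.5)$$

satisfying

$$q_{i,r} = \tilde{q}_{i,r} + \frac{1}{2w} \left( w_{ii} - \frac{w_i^2}{2w} \right) \qquad (3.6)$$

and

$$Q = \tilde{Q} + \frac{1}{2w} \sum_i \left( w_{ii} - \frac{w_i^2}{2w} \right) . \qquad (3.7)$$

The role of these individual node contributions to modularity becomes evident in
the expression of the change of modularity when node i goes from community r to
community s:

$$\Delta Q = 2 \left( \tilde{q}_{i,s} - \tilde{q}_{i,r} \right) . \qquad (3.8)$$

As a particular case, a node that forms its own community, i.e. an isolated node i, which
moves to any community s produces a change in modularity

$$\Delta Q = 2 \tilde{q}_{i,s} . \qquad (3.9)$$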

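Therefore, if there exists a community s for which $\tilde{q}_{i,s} > 0$, node i cannot be isolated in
the partition of optimal modularity. This existence is easily proved by considering the
sum of $\tilde{q}_{i,r}$ over all communities:

$$\begin{aligned}
\sum_r \tilde{q}_{i,r} &= \frac{1}{2w} \sum_{j \neq i} \left( w_{ij} - \frac{w_i w_j}{2w} \right) \\
&= \frac{1}{2w} \sum_j \left( w_{ij} - \frac{w_i w_j}{2w} \right) - \frac{1}{2w} \left( w_{ii} - \frac{w_i^2}{2w} \right) \\
&= - \frac{1}{2w} \left( w_{ii} - \frac{w_i^2}{2w} \right) , \qquad (3.10)
\end{aligned}$$

where we have made use of the definitions of strength $w_i$ and total strength 2w for the
simplification of the expression. Thus,

$$\mbox{if} \quad w_{ii} < \frac{w_i^2}{2w} \quad \Rightarrow \quad \sum_r \tilde{q}_{i,r} > 0 \quad \Rightarrow \quad \exists s : \tilde{q}_{i,s} > 0 , \qquad (3.11)$$

completing the proof that there are no isolated nodes in the configuration which
maximizes modularity, unless they have a big enough self-loop§.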
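
A small numerical check of (3.9) and (3.10) may help to fix ideas. In the sketch below a node is placed alone in its own community, the contributions of (3.4) are computed for the remaining communities, and both the sum rule (3.10) and the modularity change (3.9) are compared against a direct evaluation of Q. The example network (two triangles joined by one edge) and the helper names are assumptions of this illustration.

```python
import numpy as np

def modularity(W, c):
    c = np.asarray(c)
    two_w = W.sum()
    B = W - np.outer(W.sum(axis=1), W.sum(axis=0)) / two_w
    return float((B * (c[:, None] == c[None, :])).sum() / two_w)

def q_tilde(W, c, i, r):
    """Contribution (3.4) of node i to community r, self-loop term excluded."""
    two_w = W.sum()
    w = W.sum(axis=1)
    members = [j for j in range(len(c)) if c[j] == r and j != i]
    return sum(W[i, j] - w[i] * w[j] / two_w for j in members) / two_w

# Two triangles joined by the edge 2-3; node 2 starts isolated in community 2.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[a, b] = W[b, a] = 1.0
i, c = 2, [0, 0, 2, 1, 1, 1]
two_w, w_i = W.sum(), W[i].sum()

q = {r: q_tilde(W, c, i, r) for r in (0, 1)}
lhs = sum(q.values())                              # left-hand side of (3.10)
rhs = -(W[i, i] - w_i ** 2 / two_w) / two_w        # right-hand side of (3.10)

s = max(q, key=q.get)                              # community with the largest contribution
c_moved = list(c)
c_moved[i] = s
delta_Q = modularity(W, c_moved) - modularity(W, c)
print(lhs, rhs)               # both sides of (3.10)
print(delta_Q, 2 * q[s])      # both sides of (3.9)
```

Here the sum in (3.10) equals 9/196 > 0, so at least one community offers a positive gain, and moving node 2 into that community indeed increases modularity by twice its contribution.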

Now there remains the problem of determining an acquaintance (node j) of
node i in its optimal community, in order to group them ($R_i = R_j$) into a single equivalent
node with a self-loop, as explained above. If we know that nodes i and j share the same
community at maximum modularity, the reduced network will be equivalent to the
original one as regards modularity: no information is lost, and the size is smaller. Taking
into account that $\tilde{q}_{i,r}$ can only be positive if there is a link between node i
and another node in community r, the only candidates to be the right acquaintance of
any node are its neighbors in the network.

The simplest particular cases are hairs, i.e. nodes connected to the network by
only one link. Hence, a hair i can be analytically grouped with its neighbor k if

$$w_{ii} < \frac{w_i^2}{2w} , \qquad (3.12)$$

producing a self-loop for node k of value

$$w'_{kk} = w_{ii} + 2 w_{ik} . \qquad (3.13)$$

When node i has no self-loop ($w_{ii} = 0$) this condition is always fulfilled, see figure 1a.
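
The hair reduction can be applied repeatedly until no hair satisfying (3.12) remains. The following is a minimal sketch of such a loop for undirected networks stored as dense symmetric matrices; the function name `merge_hairs`, the scan order and the example network are assumptions of this illustration. The grouping follows (2.1), so a pre-existing self-loop of node k would simply be added to the value given in (3.13).

```python
import numpy as np

def merge_hairs(W):
    """Repeatedly group hairs with their neighbour; returns (W_reduced, R)."""
    W = np.array(W, dtype=float)
    groups = [[i] for i in range(len(W))]      # original nodes behind each current node
    two_w = W.sum()                            # invariant under grouping, eq. (2.4)
    while len(W) > 1:
        hair = None
        for i in range(len(W)):
            nbrs = [j for j in np.flatnonzero(W[i]) if j != i]
            if len(nbrs) == 1 and W[i, i] < W[i].sum() ** 2 / two_w:   # condition (3.12)
                hair, k = i, nbrs[0]
                break
        if hair is None:
            break
        # Fold the row and column of the hair into its neighbour k, as in eq. (2.1).
        W[k, :] += W[hair, :]
        W[:, k] += W[:, hair]
        groups[k] += groups[hair]
        keep = [j for j in range(len(W)) if j != hair]
        W = W[np.ix_(keep, keep)]
        groups = [groups[j] for j in keep]
    R = np.empty(sum(len(g) for g in groups), dtype=int)
    for r, g in enumerate(groups):
        R[g] = r                               # R[i] = index of the group containing i
    return W, R

# Hypothetical example: a triangle 0-1-2 with a two-node chain 3-4 on node 0 and a hair 5 on node 2.
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (0, 3), (3, 4), (2, 5)]:
    W[a, b] = W[b, a] = 1.0
W_red, R = merge_hairs(W)
print(R)        # group index of every original node
print(W_red)    # reduced weights, including the new self-loops
```

In this example node 4 is grouped with node 3 and node 5 with node 2, each producing a self-loop of weight 2. The grouped node {3, 4} is again connected to a single neighbour, but its self-loop is now too large to satisfy (3.12), so the loop stops with four nodes.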

§ Note that some authors [T3] have used the fact that no isolated nodes are obtained at the partition 
of maximum modularity to reduce the network size, simply by ignoring the existence of these nodes.
This approach clearly fails to reproduce the modularity of the original network and provides
misleading results; it should be avoided.




Figure 1. Analytic reductions for undirected networks: (a) example of a hair
reduction; (b) example of a triangular hair reduction (see text for details). In the
widespread case of unweighted networks, with all weights equal to 1, reduction (a)
gives $w'_{kk} = 2$, and reduction (b) gives $w'_{hh} = 2$ and $w'_{hk} = 2$.



Another solvable structure is the triangular hair, in which two nodes i and j have
only one link connecting them, two more links from i and j to a third node k, and
possibly self-loops. In this case, if

$$w_{ii} < \frac{w_i^2}{2w} \quad \mbox{and} \quad w_{jj} < \frac{w_j^2}{2w} , \qquad (3.14)$$

nodes i and j share the same community in the optimal partition and therefore may be
grouped as a single node h. Moreover, the resulting structure becomes a simple hair,
which can be grouped with node k if

$$w'_{hh} < \frac{{w'_h}^2}{2w} , \qquad (3.15)$$

where

$$w'_h = w_i + w_j = w'_{hh} + w'_{hk} . \qquad (3.16)$$

In the particular case of nodes i and j without self-loops ($w_{ii} = w_{jj} = 0$), the triangular
hair can always be reduced to a single hair with a self-loop $w'_{hh} = 2w_{ij}$, see figure 1b.


3.2. Reductions for directed networks

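The treatment of directed networks requires the distinction between the nodes' output
and input contributions to modularity:

$$Q = \sum_i q_i^{\mathrm{out}} = \sum_j q_j^{\mathrm{in}} , \qquad (3.17)$$

where

$$q_{i,r}^{\mathrm{out}} = \frac{1}{2w} \sum_j \left( w_{ij} - \frac{w_i^{\mathrm{out}} w_j^{\mathrm{in}}}{2w} \right) \delta(C_j, r) , \qquad q_i^{\mathrm{out}} = q_{i,C_i}^{\mathrm{out}} , \qquad (3.18)$$

$$q_{j,r}^{\mathrm{in}} = \frac{1}{2w} \sum_i \left( w_{ij} - \frac{w_i^{\mathrm{out}} w_j^{\mathrm{in}}}{2w} \right) \delta(C_i, r) , \qquad q_j^{\mathrm{in}} = q_{j,C_j}^{\mathrm{in}} . \qquad (3.19)$$

The separation of the self-loop term follows the same pattern as for undirected
networks:

$$\tilde{q}_{i,r}^{\mathrm{out}} = \frac{1}{2w} \sum_{j \neq i} \left( w_{ij} - \frac{w_i^{\mathrm{out}} w_j^{\mathrm{in}}}{2w} \right) \delta(C_j, r) , \qquad \tilde{q}_i^{\mathrm{out}} = \tilde{q}_{i,C_i}^{\mathrm{out}} , \qquad (3.20)$$

$$\tilde{q}_{j,r}^{\mathrm{in}} = \frac{1}{2w} \sum_{i \neq j} \left( w_{ij} - \frac{w_i^{\mathrm{out}} w_j^{\mathrm{in}}}{2w} \right) \delta(C_i, r) , \qquad \tilde{q}_j^{\mathrm{in}} = \tilde{q}_{j,C_j}^{\mathrm{in}} , \qquad (3.21)$$

and

$$\tilde{Q} = \sum_i \tilde{q}_i^{\mathrm{out}} = \sum_j \tilde{q}_j^{\mathrm{in}} , \qquad (3.22)$$

satisfying

$$q_{i,r}^{\mathrm{out}} = \tilde{q}_{i,r}^{\mathrm{out}} + \frac{1}{2w} \left( w_{ii} - \frac{w_i^{\mathrm{out}} w_i^{\mathrm{in}}}{2w} \right) , \qquad (3.23)$$

$$q_{i,r}^{\mathrm{in}} = \tilde{q}_{i,r}^{\mathrm{in}} + \frac{1}{2w} \left( w_{ii} - \frac{w_i^{\mathrm{out}} w_i^{\mathrm{in}}}{2w} \right) , \qquad (3.24)$$

and

$$Q = \tilde{Q} + \frac{1}{2w} \sum_i \left( w_{ii} - \frac{w_i^{\mathrm{out}} w_i^{\mathrm{in}}}{2w} \right) . \qquad (3.25)$$

With these definitions at hand, the change of modularity when node i goes from
community r to community s becomes

$$\Delta Q = \left( \tilde{q}_{i,s}^{\mathrm{out}} + \tilde{q}_{i,s}^{\mathrm{in}} \right) - \left( \tilde{q}_{i,r}^{\mathrm{out}} + \tilde{q}_{i,r}^{\mathrm{in}} \right) , \qquad (3.26)$$

and the change when an isolated node i moves to any community s is

$$\Delta Q = \tilde{q}_{i,s}^{\mathrm{out}} + \tilde{q}_{i,s}^{\mathrm{in}} . \qquad (3.27)$$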

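The first difference between directed and undirected networks comes from the fact
that this time we cannot prove the nonexistence of isolated nodes in the partition of
optimal modularity. The previous argument was based on the use of (3.10), which
now splits into two relationships:

$$\sum_r \tilde{q}_{i,r}^{\mathrm{out}} = - \frac{1}{2w} \left( w_{ii} - \frac{w_i^{\mathrm{out}} w_i^{\mathrm{in}}}{2w} \right) , \qquad (3.28)$$

$$\sum_r \tilde{q}_{i,r}^{\mathrm{in}} = - \frac{1}{2w} \left( w_{ii} - \frac{w_i^{\mathrm{out}} w_i^{\mathrm{in}}}{2w} \right) . \qquad (3.29)$$

The next step is the same:

$$\mbox{if} \quad w_{ii} < \frac{w_i^{\mathrm{out}} w_i^{\mathrm{in}}}{2w} \quad \Rightarrow \quad \sum_r \tilde{q}_{i,r}^{\mathrm{out}} > 0 \quad \Rightarrow \quad \exists s_1 : \tilde{q}_{i,s_1}^{\mathrm{out}} > 0 , \qquad (3.30)$$

$$\mbox{if} \quad w_{ii} < \frac{w_i^{\mathrm{out}} w_i^{\mathrm{in}}}{2w} \quad \Rightarrow \quad \sum_r \tilde{q}_{i,r}^{\mathrm{in}} > 0 \quad \Rightarrow \quad \exists s_2 : \tilde{q}_{i,s_2}^{\mathrm{in}} > 0 . \qquad (3.31)$$

Since communities $s_1$ and $s_2$ need not be the same, the change of modularity (3.27) is
not guaranteed to be positive, and thus isolated nodes are possible in the partition which
maximizes modularity.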

Nevertheless, there exist three kinds of nodes for which we can prove that they cannot
be isolated in the optimal partition, provided their self-loops are not too large: hairs,
sinks (nodes with only input links) and sources (nodes with only output links).

Directed hairs, i.e. nodes connected to only one other node, through an input link,
an output link, or both, necessarily have $s_1 = s_2$. Therefore, it is safe to group them
in the same way as undirected hairs if

$$w_{ii} < \frac{w_i^{\mathrm{out}} w_i^{\mathrm{in}}}{2w} . \qquad (3.32)$$

In particular, this condition is always fulfilled if the hair has no self-loop ($w_{ii} = 0$), see
figure 2a. Whenever the self-loop is present, both input and output links are needed to
counterbalance it. The resulting self-loop $w'_{kk}$ of the grouped node has value

$$w'_{kk} = w_{ii} + w_{ik} + w_{ki} . \qquad (3.33)$$

Sink nodes i are characterized by null output strengths, $w_i^{\mathrm{out}} = 0$, which imply
$\tilde{q}_{i,r}^{\mathrm{out}} = 0$ for all communities r. Thus, the change of modularity (3.27) only depends on
the value of $\tilde{q}_{i,s}^{\mathrm{in}}$, and (3.31) tells us that they can always be grouped with an increase
of modularity. The same property applies to sources, which are defined as nodes with
null input strengths, $w_i^{\mathrm{in}} = 0$. Note that sinks and sources cannot have self-loops, since
this would be in contradiction with their null output and input strengths respectively.

A triangular hair formed by a source node i and a sink node j behaves exactly as
the undirected triangular hair: it is possible to group them in a single node h with a
self-loop, see figure 2b, where

$$w'_{hh} = w_{ij} , \qquad w'_{hk} = w_{ik} , \qquad w'_{kh} = w_{kj} . \qquad (3.34)$$
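
The three kinds of nodes discussed above can be located by direct inspection of the weights matrix. The sketch below is only an illustration of the definitions: the function name `classify_directed_nodes` and the example network are hypothetical, a node that qualifies as a hair is labelled as such even if it is also a sink or a source, and the grouping condition (3.32) is not checked here.

```python
import numpy as np

def classify_directed_nodes(W):
    """Label hairs, sinks and sources of a directed network (W[i, j]: weight of i -> j)."""
    W = np.asarray(W, dtype=float)
    labels = {}
    for i in range(len(W)):
        out_nbrs = {j for j in np.flatnonzero(W[i, :]) if j != i}
        in_nbrs = {j for j in np.flatnonzero(W[:, i]) if j != i}
        if len(out_nbrs | in_nbrs) == 1:
            labels[i] = "hair"        # connected to exactly one other node
        elif W[i, :].sum() == 0 and W[:, i].sum() > 0:
            labels[i] = "sink"        # null output strength
        elif W[:, i].sum() == 0 and W[i, :].sum() > 0:
            labels[i] = "source"      # null input strength
    return labels

# Hypothetical example: 0 is a source, 3 is a sink, 4 is a hair attached to node 1.
W = np.zeros((5, 5))
for a, b in [(0, 1), (0, 2), (1, 2), (2, 1), (1, 3), (2, 3), (1, 4), (4, 1)]:
    W[a, b] = 1.0
labels = classify_directed_nodes(W)
print(labels)
```

Nodes 1 and 2 receive no label in this example, since they have several neighbours and both input and output links.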

Figure 2. Analytic reductions for directed networks: (a) example of a hair
reduction; (b) example of a triangular hair reduction (see text for details).

4. Results and discussion 

The above proofs allow us to put the problem of size reduction in complex networks
on a firm basis. In particular, this size reduction preserving modularity ensures that
the structural mesoscale found by maximizing modularity will be invariant under these
transformations. The natural question at this point is: what percentage of size
reduction can be achieved in networks using the previous rules? To answer this question
we need an estimate of the number of hairs, and triangular hairs, we might expect
in complex networks. In real networks this calculation can be performed by direct
enumeration; however, an estimate can be made on general grounds from the
degree distribution $P(k)$.

Here we provide some rough estimates for the most widespread degree distributions
in natural and artificial networks: scale-free and exponential. For scale-free networks
it is usually assumed that $P(k) = \alpha k^{-\gamma}$, with $\gamma \in [2, 3]$ for most real scale-free
complex networks. The normalization condition provides the value of $\alpha$. As a first
approximation, neglecting the structural cut-off of the network, we can write

$$\alpha \sum_{k=1}^{\infty} k^{-\gamma} = \alpha \, \zeta(\gamma) = 1 , \qquad (4.1)$$

where $\zeta(\gamma)$ is the Dirichlet series representation of the Riemann zeta function. For
values of $\gamma \in [2, 3]$ we obtain $\alpha \in [1/\zeta(2), 1/\zeta(3)] \approx [0.61, 0.83]$. That means that,
roughly speaking, the number of hairs, which corresponds to $P(1)$, is about 83% of the nodes
in a scale-free network with $\gamma = 3$ and 61% when $\gamma = 2$, although this value is slightly
reduced when considering the cut-offs of the real distributions.

Table 1. Results for the optimal partition obtained using the EO algorithm [8] for several
real networks before and after applying the size reduction. We present the number
of nodes, modularity, number of communities and speed-up of the algorithm after
reduction.

Network                  N        Q           # communities   speed-up
Zachary                  34       0.419790    4               -
Zachary-reduced          33       0.419790    4               1.00
Jazz                     198      0.444469    4               -
Jazz-reduced             193      0.445144    4               1.00
E-mail                   1133     0.580070    10              -
E-mail-reduced           981      0.581425    10              1.17
Airports-U               3618     0.706704    25              -
Airports-U-reduced       2763     0.707076    24              1.68
Airports-WU              3618     0.649268    29              -
Airports-WU-reduced      2763     0.649337    29              1.68
Airports-WD              3618     0.649189    34              -
Airports-WD-reduced      2880     0.649286    30              1.53
PGP                      10680    0.876883    118             -
PGP-reduced              6277     0.880244    101             4.27
AS(2001)                 11174    0.619048    25              -
AS(2001)-reduced         7386     0.628004    31              2.41
AS(2006)                 22963    0.645942    25              -
AS(2006)-reduced         15118    0.658198    45              2.39

An equivalent estimate can be made for exponential degree distributions of the
type $P(k) = \alpha e^{-\beta k}$, with $\beta > 0$. In this case, normalization implies that

$$\alpha \sum_{k=1}^{\infty} e^{-\beta k} = \alpha \, \frac{e^{-\beta}}{1 - e^{-\beta}} = 1 \qquad (4.2)$$

and then $\alpha = e^{\beta} - 1$. The fraction of hairs in this case is $P(1) = 1 - e^{-\beta}$, which,
for example, for plausible values of $\beta \in [0.5, 1.5]$ gives a reduction between about 39% and
78%, respectively.
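
Both estimates of the fraction of hairs P(1) are easy to reproduce numerically. The sketch below evaluates the zeta function by truncating the Dirichlet series of (4.1) and adding an integral estimate of the tail; the truncation length, the tail correction and the function names are choices made for this illustration only.

```python
import math

def zeta(gamma, terms=200_000):
    """Partial sum of the Dirichlet series plus an integral estimate of the tail."""
    partial = sum(k ** -gamma for k in range(1, terms + 1))
    return partial + terms ** (1 - gamma) / (gamma - 1)

def hair_fraction_scale_free(gamma):
    return 1.0 / zeta(gamma)          # P(1) = alpha = 1 / zeta(gamma), eq. (4.1)

def hair_fraction_exponential(beta):
    return 1.0 - math.exp(-beta)      # P(1) = alpha * exp(-beta), eq. (4.2)

for gamma in (2, 3):
    print("scale-free, gamma =", gamma, "->", round(hair_fraction_scale_free(gamma), 4))
for beta in (0.5, 1.5):
    print("exponential, beta =", beta, "->", round(hair_fraction_exponential(beta), 4))
```

As in the text, these figures neglect cut-offs and count only hairs, so they should be read as rough upper estimates of the reduction attainable with the hair rule alone.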

In the light of these estimates, the size reduction process provides an
interesting technique for analyzing community structure in networks by
maximizing modularity, with a substantial advantage in computational cost and without
sacrificing any information. We have tested our size reduction process, followed by
optimization of modularity using Extremal Optimization (EO) [8], on several real
networks. To enhance the accuracy of the EO algorithm, we perform a final
optimization step that consists of merging communities whenever modularity increases, and
rearranging the borders (moving the nodes with the lowest modularity values and testing
them in the neighboring communities) until all the nodes are better classified and no higher
modularity can be obtained by moving a single node. The results obtained improve those
obtained using Spectral optimization [H] and simulated annealing 

The networks analyzed are: the Zachary's karate club network p!3], the Jazz
musicians network [16], the e-mail network of the University Rovira i Virgili [17],
the airports network with data about passenger flights operating in the time period
November 1, 2000, to October 31, 2001 compiled by OAG Worldwide (Downers Grove,
IL) and analyzed in [18], the network of users of the PGP algorithm for secure
information transactions [19], and the Internet network at the autonomous system (AS)
level as it was in 2001 and 2006, reconstructed from BGP tables posted by the University
of Oregon Route Views Project. The results obtained are reported in Table 1.

We observe that the reduction process allows for a more exhaustive search of the
partitions' space, as expected. The speed-up of the algorithm after reduction gives
an indication of the effectiveness of the process. This is also corroborated by an
improvement in modularity. We present in Table 1 the values of modularity for the
different networks analyzed up to order $10^{-6}$. In general, the numerical resolution of
modularity is up to order minj{u7j}/2w, that represents the minimal possible change in 
the structure of the partitions. It means that every digit in our value of modularity is 
significant for comparison purposes. 

Particularly illustrative is the analysis of the airport network. We have constructed
different networks from the raw data: the undirected unweighted network previously
used in [18], the undirected weighted network (where the weights reflect the number
of passengers using the connection in the period of study), and the most realistic case,
corresponding to the weighted directed network of the airport connections. These
networks allowed us to check our techniques (reduction and optimization algorithm)
in all the possible scenarios. Note that the results obtained for the weighted directed
and undirected networks in terms of modularity are very close; an explanation of
this fact, which is ubiquitous in the analysis of directed networks, can be found in the
Appendix.

Summarizing, we have proposed an exact procedure for size reduction in complex
networks preserving modularity. The direct consequence of its application is an
improvement in computational cost, and hence accuracy, of any heuristics designed
to optimize modularity. We think that the idea of the exact reduction could be
extended to other specific motifs (building blocks) in the network, although their analytical
treatment may be considerably more difficult. The reduced network is also an appealing
concept to renormalize dynamical processes in complex networks (in the sense of real-space
renormalization). With this reduction it is plausible to perform a coarse graining of the
dynamic interactions between the formed groups. We will explore this connection in
future work.

Appendix A. Relationship between directed and undirected modularities

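Let us suppose that $w_{ij}$ are the weights of a directed weighted network, and that we
define its corresponding symmetrized (undirected) network by adding the weights matrix
to its transpose:

$$\bar{w}_{ij} = w_{ij} + w_{ji} , \qquad \forall i, j . \qquad (A.1)$$

The strengths of this undirected network are

$$\bar{w}_i = w_i^{\mathrm{out}} + w_i^{\mathrm{in}} \qquad (A.2)$$

and the total strength is

$$2\bar{w} = 4w . \qquad (A.3)$$

The modularity $Q_D$ of the directed network is invariant under transposition of the
weights matrix, since the input (output) strengths of the transposed network are equal
to the output (input) strengths of the original one:

$$Q_D = \frac{1}{2w} \sum_i \sum_j \left( w_{ji} - \frac{w_i^{\mathrm{in}} w_j^{\mathrm{out}}}{2w} \right) \delta(C_i, C_j) . \qquad (A.4)$$

The relationship between the modularity of the directed network and the
modularity $Q_S$ of its symmetrization is obtained by simple calculations:

$$\begin{aligned}
Q_S &= \frac{1}{4w} \sum_i \sum_j \left( \bar{w}_{ij} - \frac{\bar{w}_i \bar{w}_j}{4w} \right) \delta(C_i, C_j) \\
&= \frac{1}{4w} \sum_i \sum_j \left( w_{ij} + w_{ji} - \frac{(w_i^{\mathrm{out}} + w_i^{\mathrm{in}})(w_j^{\mathrm{out}} + w_j^{\mathrm{in}})}{4w} \right) \delta(C_i, C_j) \\
&= \frac{1}{4w} \sum_i \sum_j \left( w_{ij} + w_{ji} - \frac{w_i^{\mathrm{out}} w_j^{\mathrm{in}} + w_i^{\mathrm{in}} w_j^{\mathrm{out}}}{2w} \right) \delta(C_i, C_j)
 - \frac{1}{(4w)^2} \sum_i \sum_j (w_i^{\mathrm{out}} - w_i^{\mathrm{in}})(w_j^{\mathrm{out}} - w_j^{\mathrm{in}}) \, \delta(C_i, C_j) \\
&= Q_D - \frac{1}{(4w)^2} \sum_i \sum_j (w_i^{\mathrm{out}} - w_i^{\mathrm{in}})(w_j^{\mathrm{out}} - w_j^{\mathrm{in}}) \, \delta(C_i, C_j) . \qquad (A.5)
\end{aligned}$$

This result can also be expressed as a sum over communities:

$$Q_S = Q_D - \frac{1}{(4w)^2} \sum_r \left( \sum_i (w_i^{\mathrm{out}} - w_i^{\mathrm{in}}) \, \delta(C_i, r) \right)^2 . \qquad (A.6)$$

The contributions of the links to the input and output strengths cancel if they fall
within the communities. Therefore, if most links do not cross the boundaries of the
communities, it follows that $Q_S \approx Q_D$ even if the network is highly asymmetric.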
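
Relation (A.6) can be verified numerically on a random directed network. In the sketch below the test network, the seed and the partition are arbitrary choices made for this illustration; the correction term is computed from the imbalance between output and input strengths accumulated in each community.

```python
import numpy as np

def modularity(W, c):
    c = np.asarray(c)
    two_w = W.sum()
    B = W - np.outer(W.sum(axis=1), W.sum(axis=0)) / two_w
    return float((B * (c[:, None] == c[None, :])).sum() / two_w)

rng = np.random.default_rng(1)
N = 10
W = rng.random((N, N)) * (rng.random((N, N)) < 0.5)   # asymmetric weights matrix
C = rng.integers(0, 3, N)                             # arbitrary partition in up to 3 communities

Q_D = modularity(W, C)                 # directed modularity, eq. (1.2)
Q_S = modularity(W + W.T, C)           # modularity of the symmetrization (A.1)
imbalance = W.sum(axis=1) - W.sum(axis=0)             # w_i^out - w_i^in
per_community = np.array([imbalance[C == r].sum() for r in np.unique(C)])
correction = (per_community ** 2).sum() / (2 * W.sum()) ** 2   # 4w = 2 * (2w)
print(Q_S, Q_D - correction)           # both sides of (A.6)
```

Since the correction is a sum of squares, the check also illustrates that $Q_S$ can never exceed $Q_D$ for the same partition.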